Skip to content

feat: Script mode support for Estimator class #2834

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

evakravi
Copy link
Member

@evakravi evakravi commented Jan 7, 2022

Issue #, if available:

Description of changes:
This PR adds script mode support to the Estimator and EstimatorBase classes.

This was basically done by adding the parameters source_dir, git_config, hyperparameters, container_log_level, code_location, entry_point, dependencies to the Estimator and EstimatorBase classes.

Testing done:
The changes to the Estimator class do not break any existing unit tests. In addition, new unit tests were added to simulate the script mode use case for the Estimator class, and confirm that the calls to sagemaker.create_training_job() and s3 are the same for the Estimator class and Framework class when both use script mode. A test was also added to ensure that git support works with the Estimator class. Integration tests will be introduced in a subsequent PR.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backword compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2022

Codecov Report

Merging #2834 (3351ce9) into master-jumpstart (167b723) will increase coverage by 0.00%.
The diff coverage is 92.38%.

Impacted file tree graph

@@                Coverage Diff                @@
##           master-jumpstart    #2834   +/-   ##
=================================================
  Coverage             89.16%   89.17%           
=================================================
  Files                   185      185           
  Lines                 16047    16069   +22     
=================================================
+ Hits                  14308    14329   +21     
- Misses                 1739     1740    +1     
Impacted Files Coverage Δ
src/sagemaker/algorithm.py 90.56% <80.00%> (ø)
src/sagemaker/estimator.py 90.73% <92.13%> (+0.19%) ⬆️
src/sagemaker/chainer/estimator.py 100.00% <100.00%> (ø)
src/sagemaker/huggingface/estimator.py 88.37% <100.00%> (ø)
src/sagemaker/model.py 88.81% <100.00%> (-0.11%) ⬇️
src/sagemaker/pytorch/estimator.py 94.44% <100.00%> (ø)
src/sagemaker/rl/estimator.py 97.76% <100.00%> (ø)
src/sagemaker/tensorflow/estimator.py 95.45% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 167b723...3351ce9. Read the comment docs.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f27d5a7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: f27d5a7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: f27d5a7
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: f27d5a7
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@JGuinegagne JGuinegagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be able to add unit tests to verify that:

  • git_config is also handled correctly and similarly b7 DummyFramework and Estimator
  • hyperparameters are handled correctly (if not covered already)

@@ -437,6 +581,21 @@ def _get_or_create_name(self, name=None):
self._ensure_base_job_name()
return name_from_base(self.base_job_name)

@staticmethod
def _json_encode_hyperparameters(hyperparameters):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you are just refactoring, but is it possible to add args & return typings?

self._prepare_rules()
self._prepare_debugger_for_training()
self._prepare_profiler_for_training()

def _script_mode_hyperparam_update(self, code_dir, script):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment.


self._hyperparameters.update(EstimatorBase._json_encode_hyperparameters(hyperparams))

def _stage_user_code_in_s3(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment.

self.uploaded_code = self._stage_user_code_in_s3()
code_dir = self.uploaded_code.s3_prefix
script = self.uploaded_code.script_name
def _script_mode_hyperparam_update(self, code_dir, script):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment.

code_dir (str): The directory hosting the training scripts.
script (str): The relative filepath of the training entry-point script.
"""
hyperparams = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typing please:

hyperparams: Dict[str, str] = {}

@patch("sagemaker.estimator.Estimator._stage_user_code_in_s3")
def test_script_mode_estimator(patched_stage_user_code, sagemaker_session):
patched_stage_user_code.return_value = UploadedCode(
s3_prefix="s3://%s/%s" % ("bucket", "key"), script_name="script_name"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are you trying to be consistent with the rest of the module when you use the ""%(*args) pattern?

If not, could you please use f-string?

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 1441e9f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 1441e9f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 1441e9f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 1441e9f
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@evakravi evakravi force-pushed the feat/script-mode-support-for-estimator-class branch from 1441e9f to 7295190 Compare January 19, 2022 19:13
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 7295190
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: dcaef38
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a3cabe8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 7295190
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: dcaef38
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: a3cabe8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: dcaef38
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: a3cabe8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 7295190
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: dcaef38
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@JGuinegagne JGuinegagne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few non-blocking comments/suggestions.

try to use either CodeCommit credential helper or local
credential storage for authentication.
hyperparameters (dict): Dictionary containing the hyperparameters to
initialize this estimator with.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Default: None)

If not specified, the default ``code location`` is s3://output_bucket/job-name/.
entry_point (str): Path (absolute or relative) to the local Python
source file which should be executed as the entry point to
training. If ``source_dir`` is specified, then ``entry_point``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Default: None)

@@ -437,6 +582,21 @@ def _get_or_create_name(self, name=None):
self._ensure_base_job_name()
return name_from_base(self.base_job_name)

@staticmethod
def _json_encode_hyperparameters(hyperparameters: dict) -> dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you use Dict[str, Any] instead of dict for both the argument and the return type.

self._hyperparameters.update(EstimatorBase._json_encode_hyperparameters(hyperparams))

def _stage_user_code_in_s3(self) -> str:
"""Upload the user training script to s3 and return the location.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ...and return the S3 URI.

code_s3_prefix = "/".join(filter(None, [key_prefix, self._current_job_name, "source"]))

output_bucket, _ = parse_s3_url(self.output_path)
kms_key = self.output_kms_key if code_bucket == output_bucket else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-blocking question: i know you are just refactoring this code, but is this a concern? i.e. are we conforming to customer expectation if we do not use the "output" encryption key when the script gets uploaded to a different bucket than the output bucket?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, this section was directly lifted from somewhere else in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will need to dive into this.

@@ -2376,6 +2690,10 @@ def _model_entry_point(self):

return None

def set_hyperparameters(self, **kwargs):
"""Sets hyperparameters."""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's avoid repeating function name in docstring. How about:

"""Escape the dict argument as JSON, update the private hyperparameter attribute."""

repack=self.source_dir
and self.entry_point
and not (self.key_prefix or self.git_config),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit for readability, i would vote for:

is_repack = self.source_dir and self.entry_point and not (self.key_prefix or self.git_config)
self._upload_code(deploy_key_prefix, repack=is_repack)

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 3351ce9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 3351ce9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 3351ce9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@shreyapandit shreyapandit changed the title feat: script mode support for estimator class feat: Script mode support for Estimator class Jan 21, 2022
@shreyapandit shreyapandit merged commit d9d8c68 into aws:master-jumpstart Jan 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants